A Clustering Approach for Achieving Data Privacy
Authors
Abstract
New privacy regulations, together with ever-increasing data availability and computational power, have created strong interest in data privacy research. One major research direction centers on the k-anonymity property and its extensions, used as requirements that released data must satisfy. In this paper we present such an extension of k-anonymity, called p-sensitive k-anonymity, which addresses some of the weaknesses that the k-anonymity model has been shown to have. We also introduce a new algorithm, based on a greedy clustering approach, for enforcing p-sensitive k-anonymity on microdata sets. To limit information loss, the proposed algorithm uses cell-level generalization for categorical attributes and hierarchy-free generalization for numerical attributes. We believe this algorithm can be adapted to enforce other, similar privacy models as well, with better results than the algorithms originally proposed alongside those models. Our experiments show that the proposed algorithm efficiently generates masked microdata with the p-sensitive k-anonymity property.
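The abstract does not spell out the algorithm's details, so the Python below is only a minimal sketch of the general idea under our own assumptions, not the paper's method. Records are dicts with a numeric quasi-identifier `age`, a categorical quasi-identifier `zip` with a toy generalization hierarchy (`ZIP_PARENT`), and one confidential attribute `disease`; all of these names, the hierarchy, the seed choice, and the information-loss measure are hypothetical. Clusters are grown greedily until each holds at least k records and at least p distinct confidential values. Each cluster's ages are then replaced by their `[min-max]` interval (hierarchy-free generalization), and its zip codes by their lowest common ancestor in the hierarchy (one reading of per-cluster, cell-level generalization). The last function checks the property as it is usually stated: every group of records sharing quasi-identifier values must contain at least k records and at least p distinct values of each confidential attribute.

```python
from collections import defaultdict

# Hypothetical generalization hierarchy for the categorical quasi-identifier
# "zip", stored as child -> parent; "*" is the root.
ZIP_PARENT = {"41075": "4107*", "41076": "4107*", "41099": "4109*",
              "4107*": "410**", "4109*": "410**", "410**": "*"}
LEAVES = [v for v in ZIP_PARENT if v not in ZIP_PARENT.values()]

def ancestors(value):
    """Path from a hierarchy node up to the root, starting with the node itself."""
    path = [value]
    while path[-1] in ZIP_PARENT:
        path.append(ZIP_PARENT[path[-1]])
    return path

def lowest_common_ancestor(values):
    """Most specific hierarchy node that covers every value in the list."""
    common = ancestors(values[0])
    for v in values[1:]:
        covering = set(ancestors(v))
        common = [a for a in common if a in covering]
    return common[0]

def cat_loss(node):
    """Share of extra leaves covered by a generalized node: 0 = exact value, 1 = root."""
    covered = sum(1 for leaf in LEAVES if node in ancestors(leaf))
    return (covered - 1) / (len(LEAVES) - 1)

def cluster_loss(cluster, age_range):
    """Assumed loss measure: normalized age-interval width plus zip hierarchy loss."""
    ages = [r["age"] for r in cluster]
    num = (max(ages) - min(ages)) / age_range if age_range else 0.0
    return num + cat_loss(lowest_common_ancestor([r["zip"] for r in cluster]))

def satisfies(cluster, k, p):
    """At least k records and at least p distinct confidential values."""
    return len(cluster) >= k and len({r["disease"] for r in cluster}) >= p

def greedy_cluster(records, k, p):
    ages = [r["age"] for r in records]
    age_range = max(ages) - min(ages)
    remaining, clusters = list(records), []
    # Open a new cluster only while the unassigned records can still satisfy both constraints.
    while len(remaining) >= k and len({r["disease"] for r in remaining}) >= p:
        cluster = [remaining.pop(0)]  # simplification: seed with the first unassigned record
        while not satisfies(cluster, k, p):
            seen = {r["disease"] for r in cluster}
            # While short of p distinct values, only consider records that add a new one.
            pool = [r for r in remaining if r["disease"] not in seen] if len(seen) < p else remaining
            best = min(pool, key=lambda r: cluster_loss(cluster + [r], age_range))
            remaining.remove(best)
            cluster.append(best)
        clusters.append(cluster)
    if remaining and not clusters:
        raise ValueError("records cannot satisfy p-sensitive k-anonymity")
    # Leftover records join the cluster whose loss grows least; constraints stay satisfied.
    for r in remaining:
        best = min(clusters, key=lambda c: cluster_loss(c + [r], age_range) - cluster_loss(c, age_range))
        best.append(r)
    return clusters

def generalize(clusters):
    """Per cluster: [min-max] interval for age, lowest common ancestor for zip."""
    masked = []
    for c in clusters:
        ages = [r["age"] for r in c]
        age = f"[{min(ages)}-{max(ages)}]"
        zip_code = lowest_common_ancestor([r["zip"] for r in c])
        masked += [{"age": age, "zip": zip_code, "disease": r["disease"]} for r in c]
    return masked

def is_p_sensitive_k_anonymous(rows, qi, sensitive, k, p):
    """Each group sharing quasi-identifier values has >= k rows and
    >= p distinct values for every confidential attribute."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[a] for a in qi)].append(row)
    return all(len(g) >= k and all(len({row[s] for row in g}) >= p for s in sensitive)
               for g in groups.values())

records = [
    {"age": 25, "zip": "41075", "disease": "flu"},
    {"age": 27, "zip": "41076", "disease": "cancer"},
    {"age": 26, "zip": "41075", "disease": "flu"},
    {"age": 52, "zip": "41099", "disease": "hepatitis"},
    {"age": 55, "zip": "41099", "disease": "flu"},
    {"age": 50, "zip": "41076", "disease": "cancer"},
]
masked = generalize(greedy_cluster(records, k=3, p=2))
print(is_p_sensitive_k_anonymous(masked, ("age", "zip"), ("disease",), k=3, p=2))  # True
```

Seeding with the first unassigned record keeps the sketch short; greedy clustering methods more typically choose seeds that are far apart. Restricting candidates to records with an unseen confidential value while a cluster has fewer than p distinct values guarantees the inner loop can finish, because the outer loop only opens a cluster when at least k records and p distinct values remain unassigned.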
Similar sources
Entropy-based Consensus for Distributed Data Clustering
The increasingly large scale of available data and increasingly restrictive concerns about its privacy are some of the challenging aspects of data mining today. In this paper, Entropy-based Consensus on Cluster Centers (EC3) is introduced for clustering in distributed systems with a consideration for confidentiality of data; i.e. it is the negotiations among local cluster centers that are used in t...
A Hybrid Privacy Preserving Approach in Data Mining
Data mining algorithms extract unknown, interesting patterns from large collections of data. Some confidential or secret information may be exposed as part of the data mining process. In this paper we put forward a hybrid approach for achieving privacy during the mining procedure. The first step is to sanitize the original data using a geometrical data transformation. In the second stage ...
Repeated Record Ordering for Constrained Size Clustering
One of the main techniques used in data mining is data clustering, which has many applications in computer science, biology, and social sciences. Constrained clustering is a type of clustering in which side information provided by the user is incorporated into current clustering algorithms. One of the well-researched constrained clustering algorithms is called microaggregation. In a microaggreg...
Privacy Preserving Distributed K-Means Clustering in Malicious Model Using Zero Knowledge Proof
Preserving privacy is crucial in distributed environments where data mining becomes a collaborative task among participants. Critical applications in distributed environments demand a higher level of privacy with lower overhead. Solutions based on cryptography provide a higher level of privacy but scale poorly due to their higher overhead. Further, existing cryptography based solu...
A centralized privacy-preserving framework for online social networks
There are some critical privacy concerns in current online social networks (OSNs). Users' information is disclosed to different entities that were not supposed to access it. Furthermore, the notion of friendship is inadequate in OSNs, since the degree of social relationships between users changes dynamically over time. Additionally, users may define similar privacy settings for their f...